Overlap of vitamin A and vitamin D target genes with CAKUT-related processes [version 2; peer review: 2 approved, 2 approved with reservations]

Congenital Anomalies of the Kidney and Urinary Tract (CAKUT) are a group of abnormalities affecting the kidneys and their outflow tracts. CAKUT patients display a large clinical variability as well as a complex aetiology. Only 5% to 20% of the cases have a monogenic origin. It is thereby suspected that interactions of both genetic and environmental factors contribute to the disease. Vitamins are among the environmental factors that are considered for CAKUT aetiology. In this study, we aimed to investigate whether vitamin A or vitamin D could have a role in CAKUT aetiology. For this purpose we collected vitamin A and vitamin D target genes and computed their overlap with CAKUT-related gene sets. We observed limited overlap between vitamin D targets and CAKUT-related gene sets. We however observed that vitamin A target genes significantly overlap with multiple CAKUT-related gene sets, including CAKUT causal and differentially expressed genes, and genes involved in renal system development. Overall, these results indicate that an excess or deficiency of vitamin A might be relevant to a broad range of urogenital abnormalities.
This study involves generating curated vitamin A and D gene sets from the public domain and assessing the relevance of the genes to CAKUT and CAKUT related pathways. The authors meticulously report their results and curation efforts so the findings are readily reproducible and are appropriate for F1000Research.
The paper by Ozisik et al. investigates associations of Vitamin A and D target genes with renal development and CAKUT pathways. They find a significant overlap for vitamin A target genes but less so for vitamin D target genes. This fits with previous epidemiological studies on vitamin A and renal size. Overall, I find the paper interesting because it provides insights into non-genetic causes of CAKUT. I have only a few minor points: 263 "The increased availability of multi-omics technologies, such as whole genome sequencing and metabolomics will certainly open new directions, e.g. by revealing new causal genes or identifying metabolic disturbances due to environmental factors."
The manuscript "Overlap of vitamin A and vitamin D target genes with CAKUT-related processes" aims to indirectly evaluate the possible relationship between Vitamin A and D intake imbalance and congenital anomalies of the kidney and urinary tract (CAKUT). The work considers genes reported as dysregulated in CAKUT and key genes involved in different urinary morphogenetic pathways. Both were overlapped with target genes of Vitamin A and Vitamin D (extracted from CTD and from literature: Balmer and Blomhoff, 2002 for Vitamin A, Ramagopalan et al., 2010 for Vitamin D). As expected considering the biological activity of the Vitamin A morphogenetic derivative (retinoic acid), significant overlaps are reported in the manuscript between Vitamin A target genes and genes involved in several urinary morphogenetic processes and signalling. Interestingly, significant overlaps were observed between Vitamin A target genes and CAKUT-related genes. As far as Vitamin D is concerned, overlaps were limited to some kidney morphogenetic genes, and no overlap was found with CAKUT-related genes. In general, this work can be considered of interest in order to postulate a possible relationship between Vitamin imbalance (mainly Vitamin A) and CAKUT and to suggest that a nutritional evaluation in pregnancy is essential to ensure a physiological development and maternal/fetal health.
Suggestions to improve the impact of the manuscript:
Abstract: The abstract should be rewritten in order to better clarify aim and results of the work. In particular, Vitamin D results are lacking.
We revised the abstract; we clarified our aim and added the results for vitamin D. “...In this study, we aimed to investigate whether vitamin A or vitamin D could have a role in CAKUT aetiology. For this purpose we collected vitamin A and vitamin D target genes and computed their overlap with CAKUT-related gene sets. We observed limited overlap between vitamin D targets and CAKUT-related gene sets. We however observed that vitamin A target genes significantly overlap with multiple CAKUT-related gene sets, including CAKUT causal and differentially expressed genes, and genes involved in renal system development. Overall, these results indicate that an excess or deficiency of vitamin A might be relevant to a broad range of urogenital abnormalities.”
“... ectopic budding of the ureteric tips and promotes elongation of the ureter. 49 RET is a tyrosine-protein kinase receptor, involved in a large variety of cellular processes. RET signaling is important, in particular, for the terminal growth of the nephric duct, and for initial formation and outgrowth of the ureteric bud. 50 The Hepatocyte Nuclear Factor 1-beta (HNF1B) is a transcription factor involved in ureteric bud branching and induction of nephrogenesis. It is required for normal S-shaped bodies patterning and subsequent morphogenesis of all nephron segments. 51 Finally, Nuclear receptor interacting protein 1 (NRIP1) modulates the activity of nuclear receptors. It has been shown to play roles in renal malformations via deregulations of retinoic acid signaling. 52 ”


Introduction
Congenital Anomalies of the Kidney and Urinary Tract (CAKUT) are a group of abnormalities affecting the kidneys and their outflow tracts, including the ureters, the bladder, and the urethra. In the European Union, the overall prevalence of CAKUT (in live plus stillbirths) between 2011 and 2018 was approximately 35:10,000. 1 The main anomalies observed in CAKUT are hydronephrosis (13.02:10,000) and multicystic renal dysplasia (4.33:10,000); these are followed by posterior urethral valves (1.26:10,000), bilateral renal agenesis (1.27:10,000) and bladder exstrophy/epispadia (0.62:10,000); many other anomalies with low prevalence can also be observed. 1 This clinical variability observed in patients is accompanied by a variable severity of the phenotypes. 2 Approximately 40 different genes are known to be associated with monogenic causes of CAKUT in humans, but they explain only 5% to 20% of the cases. 3,4 The clinical variability and the complex aetiology of CAKUT cases suggest a multifactorial origin with complex interactions of both genetic and environmental factors contributing to the disease. 2,3,5 The role of different environmental factors in CAKUT pathogenesis has been studied previously. It has been shown, for instance, that the drugs inhibiting the angiotensin-converting enzymes cause a specific form of CAKUT, namely tubular dysgenesis. 6 Many prenatal and maternal factors were also assessed for their involvement in CAKUT and related anomalies. For example, studies have observed CAKUT associations with pregestational maternal diabetes mellitus, 7,8,9 gestational maternal diabetes mellitus, 8,10,11 abnormal volume of amniotic fluid, 7,10 urogenital infections before the pregnancy, 12 any infection during the pregnancy, 12 maternal overweight and obesity, 7,8,11,13 and lower infant birth weight. 7,8,9,10 These maternal and prenatal factors are modulated by environmental risk factors. 
In particular, maternal diabetes and overweight/ obesity as well as low birth weight are related to the nutrition of the maternal-fetal dyad. In this regard, trace nutrients, like vitamins, are likely to play important roles that still await full characterization. 14 Vitamins are known to regulate renal development. 15,16,17,18 The association of maternal vitamin A deficiency with nephron reduction was studied on a rat model. 19 The inactivation of the retinoic acid nuclear receptors also resulted in renal malformations in mice. 20 An additional study on rats showed that vitamin A deficiency downregulates RET expression which is essential for epithelial-mesenchymal interaction during renal development. 21 Goodyear et al. compared a group of pregnant women in Bangalore with a high (55%) prevalence of vitamin A deficiency with a group of pregnant women in Montreal with negligible vitamin A deficiency. 22 The authors found that renal volume in newborns, adjusted for body surface area, was significantly lower in the vitamin A-deficient population. While the two populations are different for other genetic and environmental reasons, these results suggest that vitamin A deficiency may have a role in human CAKUT. El-Khashab et al. reported a similar finding in their study on a cohort of Egyptian mother-child pairs; 23 children of vitamin A deficient mothers had significantly lower kidney sizes.
REVISED Amendments from Version 1
In line with the comments from the reviewers, we made the following new analyses and changes:
In the previous version of the manuscript, we presented the overlap analysis of vitamin A and vitamin D target genes with 13 CAKUT-related gene sets. In this revised version, we added a set of CAKUT differentially expressed genes to the analyses. We hence repeated the hypergeometric test for 14 gene sets. Due to the multiple testing correction, previous adjusted p-values slightly changed.
We added explanations for the statistical test, the null hypothesis, and the background sets used in the analyses.
As a supplementary analysis, we calculated the overlap statistics with randomized sampling (in addition to the hypergeometric test in the main manuscript).
As a supplementary file, we provide the overlap of CAKUT causal genes with other CAKUT-related gene sets without considering being a vitamin target, for the interested readers.
We added descriptions for the roles of BMP4, HNF1B, RET and NRIP1, which are CAKUT causal genes and vitamin A targets according to both CTD and Balmer and Blomhoff (2002).
Any further responses from the reviewers can be found at the end of the article
The research on the effects of vitamin D deficiency on kidney development has generated interesting results. Rogers et al. 24 observed that incubation of metanephroi with vitamin D3 prior to implantation into adult rats increased the number of glomeruli. Conversely, Maka et al. 25 and Nascimento et al. 26 observed that maternal vitamin D deficiency stimulated nephrogenesis in rats. Following these studies, Boyce et al. examined the long-term effects of maternal vitamin D deficiency and observed that adult male rat offspring of vitamin D deficient dams had reduced creatinine clearance, indicating reduced renal functional capacity. 27 They also observed a significant upregulation of renal renin mRNA expression in fetuses and adult male offspring, and reported smaller kidneys in the offspring. Miliku et al. examined the association of 25-hydroxyvitamin D levels during mid-pregnancy with childhood kidney outcomes among 4212 mother-child pairs. 28 They observed that children of mothers who were vitamin D deficient during pregnancy had a larger combined kidney volume compared to children of mothers who had optimal vitamin D levels.
In this study, we examined whether vitamin A and vitamin D target genes are related to gene sets involved in CAKUT and renal system development. Our aim is to generate hypotheses for the involvement of these vitamins in the disease aetiology. We first created a list of vitamin A and vitamin D target genes, extracting information from the Comparative Toxicogenomics Database (CTD) 29 as well as from two publications. 30,31 We then constructed different gene sets relevant to CAKUT, including genes mutated in monogenic forms of CAKUT, genes involved in renal system development, and genes involved in different pathways of interest. Finally, we performed overlap analyses to identify the genes involved in these different gene sets as well as the significant overlaps.

Vitamin A target genes
We first queried vitamin A and downloaded all gene interactions of vitamin A and its descendants from the Comparative Toxicogenomics Database (CTD). 29 We selected only the interactions supported by at least two references. This gave a list of 1086 target genes.
As an independent source, we used the data from the study of Balmer and Blomhoff. 30 In this study, the authors reviewed published data from 1191 articles to identify genes regulated by retinoic acid (the active metabolite of vitamin A) in humans and other species. Only a small subset of these articles are part of the CTD curation. The authors provide a list of retinoic acid target genes split into four categories, ranging from strong evidence to indirect regulation through a transcriptional intermediary. We selected the genes from all of these categories, converted gene symbols to human orthologs, and updated gene symbols when necessary using the HUGO Gene Nomenclature Committee (HGNC), Rat Genome Database (RGD), Mouse Genome Database (MGD), BioMart and SynGo. The final list of genes obtained from Balmer and Blomhoff contains 521 target genes, of which 229 are common with genes from CTD.

Vitamin D target genes
We queried vitamin D and downloaded all gene interactions of vitamin D and its descendants from the CTD. We selected only the interactions supported by at least two references, and obtained a list of 263 target genes.
Ramagopalan et al. 31 identified 230 genes with significant expression changes in response to vitamin D on lymphoblastoid cell lines using microarrays. This publication is not part of the CTD curation, hence it brings additional and independent information. We checked the gene names using HGNC and obtained a list of 210 target genes. There are only 15 common genes between the list from CTD and the list from Ramagopalan et al. The combined list of vitamin D target genes (458 genes) has 134 genes in common with the combined list of vitamin A target genes (1378 genes).
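The combined list sizes quoted above follow from inclusion-exclusion on the reported counts. The short sketch below only re-derives that arithmetic; the helper name `union_size` is ours, introduced for illustration, and is not taken from the authors' script.

```python
def union_size(size_a: int, size_b: int, shared: int) -> int:
    """Size of the union of two sets, given both sizes and the size of their intersection."""
    return size_a + size_b - shared


# Counts as reported in the text.
vit_a_combined = union_size(1086, 521, 229)  # CTD + Balmer and Blomhoff
vit_d_combined = union_size(263, 210, 15)    # CTD + Ramagopalan et al.

print(vit_a_combined, vit_d_combined)  # 1378 458
```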

CAKUT-related gene sets
We collected complementary gene sets relevant to CAKUT and renal system development from several sources, outlined below.

CAKUT causal genes set
First, we created a set of genes known to be mutated in the monogenic form of CAKUT. To do so, we combined two lists of genes provided in the studies of van der Ven et al. 3,4 Only six genes are different between these two lists, and their union leads to 42 causal genes (see Extended data 32 ).

CAKUT differentially expressed genes set
Jovanovic et al. 33 identified 27 upregulated and 51 downregulated genes as a result of the transcriptome profiling of 15 CAKUT and 7 control ureter samples. We retrieved these differentially expressed genes (DEGs) from the supplementary file they provided. We updated gene names with the help of HUGO Gene Nomenclature Committee (HGNC) and obtained a final list of 74 DEGs (see Extended data 32 ).

Gene Ontology kidney development gene sets
We extracted the genes annotated with three Gene Ontology (GO) terms: renal system development, kidney development, and kidney morphogenesis. 34,35 Please note that these terms have parent-child relationships.

Reactome pathways
We selected different pathways of interest for CAKUT. The renin-angiotensin system is essential for kidney development and mutations in the genes of this system result in CAKUT. 36 In addition, Davis et al. discussed the role of RET signaling in kidney development and CAKUT. 37 In multiple studies 38-43 the roles of WNT, NOTCH, and Hedgehog signaling in kidney development and kidney diseases are discussed. Based on these studies we selected the following Reactome 44 pathways:
• R-HSA-2022377 Metabolism of angiotensinogen to angiotensins (18 genes)
• R-HSA-8853659 RET signaling (38 genes)
• R-HSA-195721 Signaling by WNT (328 genes)
• R-HSA-157118 Signaling by NOTCH (233 genes)
• R-HSA-5358351 Signaling by Hedgehog (149 genes)

Overlap analyses
We computed the overlaps between the vitamin target gene sets and the CAKUT-related gene sets defined above. We then calculated the significance of each overlap using the hypergeometric test. The hypergeometric test calculates the statistical significance of getting k successes in n draws without replacement from a finite population of size N that contains a total of K successes. In our context, k is the number of vitamin target genes overlapping with a specific gene set, K is the size of that gene set, n is the number of vitamin targets, and N is the size of the background gene set (also known as the reference set). We set N to the number of annotated genes in the database from which we obtained the tested gene set. For the CAKUT causal genes set and the differentially expressed genes set, we used the largest N among the different databases, which corresponds to Gene Ontology. The null hypothesis is that the vitamin target genes are not associated with the gene set and that the selection of k genes from that gene set is just the result of simple random sampling.
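As an illustration of the test described above, the following minimal sketch computes the upper-tail hypergeometric probability with the standard library only. It assumes that significance is assessed as P(X ≥ k), the usual choice for over-representation; the function name `hypergeom_pvalue` and the toy numbers are hypothetical and are not taken from overlapAnalysis.py.

```python
from math import comb


def hypergeom_pvalue(k: int, K: int, n: int, N: int) -> float:
    """Upper-tail probability P(X >= k) of the hypergeometric distribution.

    k: observed overlap, K: gene set size,
    n: number of vitamin targets, N: background size.
    """
    if k <= 0:
        return 1.0  # an overlap of at least zero genes is certain
    upper = min(K, n)  # the overlap cannot exceed either set size
    total = comb(N, n)  # all ways to draw n genes from the background
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example (hypothetical numbers): background of 10 genes, gene set of 4,
# 3 targets, 2 of them in the gene set.
p = hypergeom_pvalue(k=2, K=4, n=3, N=10)
print(round(p, 4))  # 0.3333
```

Exact integer arithmetic with `math.comb` avoids rounding problems for large backgrounds; a statistics library would typically be used instead for speed.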
We performed a hypergeometric test for all the gene sets. A consequence of multiple testing is false discoveries, in which the null hypothesis is rejected erroneously. To control the false discovery rate, we used the Benjamini-Hochberg correction method (BH adjusted).
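The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is a generic textbook implementation, not the authors' code; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that each
    # adjusted value is never larger than the one for the next higher rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Because each adjusted value depends on the number of tests m, adding a gene set to the analysis changes all adjusted p-values, even for overlaps whose raw p-values are unchanged.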

Supplementary analyses
We performed two supplementary analyses.
• We calculated the overlap statistics with a randomized sampling approach instead of the hypergeometric test, as a confirmation.
• We calculated the overlaps of the CAKUT causal genes set with the other CAKUT-related gene sets (without considering whether the genes are vitamin targets).
The details of the supplementary analyses and results are provided in Extended data. 32

Results and discussion
The results of the overlap analyses between vitamin A and D target genes and CAKUT-related gene sets are presented in Tables 1 and 2, and the corresponding data is available in Underlying data. 32
Vitamin A target genes and CAKUT-related gene sets show significant overlaps
Vitamin A target genes obtained from both CTD and Balmer and Blomhoff are significantly enriched in the set of CAKUT causal genes, with an overlap of ten and six genes, respectively (Table 1). Vitamin A target genes obtained from CTD are also significantly enriched in the set of CAKUT differentially expressed genes (DEGs).
Significant overlaps are observed focusing on GO terms related to renal system development, kidney development, and kidney morphogenesis. These results are expected due to the recognized role of vitamin A in differentiation. 46,47 It should be noted that all overlapping CAKUT causal genes except NRIP1 are also part of the kidney development GO term.
Four genes (BMP4, HNF1B, RET, NRIP1) appear as particularly interesting as they are CAKUT causal genes and vitamin A targets according to both CTD and Balmer and Blomhoff. BMP4 is a growth factor of the TGF-beta family involved in a wide variety of developmental processes. 48 BMP4 inhibits ectopic budding of the ureteric tips and promotes elongation of the ureter. 49 RET is a tyrosine-protein kinase receptor, involved in a large variety of cellular processes. RET signaling is important, in particular, for the terminal growth of the nephric duct, and for initial formation and outgrowth of the ureteric bud. 50 The Hepatocyte Nuclear Factor 1-beta (HNF1B) is a transcription factor involved in ureteric bud branching and induction of nephrogenesis. It is required for normal S-shaped bodies patterning and subsequent morphogenesis of all nephron segments. 51 Finally, Nuclear receptor interacting protein 1 (NRIP1) modulates the activity of nuclear receptors. It has been shown to play roles in renal malformations via deregulations of retinoic acid signaling. 52
Concerning the overlap with Reactome pathways, the vitamin A target genes obtained from CTD are significantly enriched in signaling by NOTCH. The NOTCH signaling pathway is mainly involved in cell-to-cell communication, 53 and it plays a role in the development of most organs and tissues. 54 Other Reactome terms also have overlapping genes, even if not significant. This includes the WNT signaling pathway (which is involved in developmental processes such as patterning, differentiation/proliferation shift, and migration) and the metabolism of angiotensinogen to angiotensins pathway, from which CTSD, CTSG and MME are vitamin A targets and play roles in the generation of different angiotensin peptides. 55,56,57 The impaired conversion of angiotensin I to angiotensin II is the pharmacodynamic adverse mechanism underlying the fetopathy caused by ACE-inhibiting drugs. 58 One main feature of this teratogen-induced fetopathy, namely renal tubular dysgenesis, belongs to the CAKUT spectrum. In addition, an overproduction of angiotensin II might have a role in CAKUT anomalies involving renal tubules, as observed in cases of the autosomal recessive polycystic kidney disease. 59
Finally, vitamin A target genes extracted from both CTD and Balmer and Blomhoff are significantly enriched in the development of ureteric collection system, genes controlling nephrogenesis, and GDNF/RET signaling axis pathways, as defined in WikiPathways. The target genes extracted from CTD are also significantly enriched in nephrogenesis. Signaling by GDNF through the RET receptor is required for normal growth of the ureteric bud. 60 Localized expression of GDNF by the metanephric mesenchyme is important to elicit and correctly position the initial budding event from the Wolffian duct and to promote the continued branching of the ureteric bud.
Overall, the significant overlaps observed between vitamin A and the CAKUT gene sets indicate that vitamin A might be relevant to a broad range of urogenital abnormalities.
Vitamin D target genes overlap with renal system development
In the analyses of vitamin D target gene sets, we observed very few significant overlaps with CAKUT-related gene sets overall. Vitamin D target genes obtained from CTD are significantly enriched only in the renal system development and kidney development GO terms. It should be noted that in each case 70% of the enriched genes are also vitamin A target genes (according to CTD or Balmer and Blomhoff).
CTSZ, which cleaves angiotensin I to yield angiotensin II, 61 is the only vitamin D target in the metabolism of angiotensinogen to angiotensins pathway. Thus, within CAKUT, vitamin D might also have a role in renal tubular defects related to anomalies of the renin-angiotensin system.
The WNT and NOTCH signaling pathways contain genes that are targets of vitamin D, but the overlap is small and non-significant, and the overlapping genes are neither receptors nor ligands of WNT or NOTCH.

Conclusions
In this study we examined the overlaps of vitamin A and vitamin D target gene sets extracted from different databases and publications with CAKUT-related genes and biological processes. While we only observed significant enrichment of vitamin D target genes in GO kidney development terms, there is significant enrichment of vitamin A target genes in most of the gene sets we tested. Indeed, we observed that vitamin A target genes are enriched in the CAKUT causal genes set, the CAKUT differentially expressed genes set, kidney development pathways, the signaling by NOTCH pathway, and the GDNF/RET signaling axis pathway. There is also an overlap with the signaling by WNT pathway, although it is not statistically significant. Overall, the significant overlaps observed between vitamin A and CAKUT gene sets indicate that vitamin A might be relevant to a broad range of urogenital abnormalities, pointing to the importance of nutritional management during pregnancy for fetal development. However, it should be clear that our study is dedicated to the generation of hypotheses. Additional studies are needed, using up-to-date datasets obtained with new omics techniques, to investigate the fine-grained effects of vitamin excess and deficiency on CAKUT, and to decipher the pathophysiological mechanisms. The increased availability of multi-omics technologies, such as whole genome sequencing and metabolomics, will certainly open new directions, e.g. by revealing new causal genes or identifying metabolic disturbances due to environmental factors.

Data availability
Underlying data
Zenodo: Overlap of vitamin A and vitamin D target genes with CAKUT-related processes. https://www.doi.org/10.5281/zenodo.4501623. 32
This project contains the following underlying data in the 'Data' folder:
• VitA-CTD-Genes.txt (list of vitamin A target genes from CTD used for overlap analyses).
• VitA-Balmer2002-Genes.txt (list of vitamin A target genes from Balmer and Blomhoff used for overlap analyses).
• VitD-CTD-Genes.txt (list of vitamin D target genes from CTD used for overlap analyses).
• VitD-Ramagopalan2010.txt (list of vitamin D target genes from Ramagopalan et al. used for overlap analyses).
• PathwaysOfInterestBackground.txt (list indicating which background GMT file to be used for the analysis of each CAKUT-related gene set).
• hsapiens.GO:BP.name.gmt (all gene sets from GO:BP used as background information for overlap analyses).
• hsapiens.REAC.name.gmt (all gene sets from Reactome used as background information for overlap analyses).
• hsapiens.WP.name.gmt (all gene sets from WikiPathways used as background information for overlap analyses).
This project contains the following underlying data in the 'Result' folder:
• VitA-CTD-Genes.csv (overlap analysis results for vitamin A targets from CTD).
• VitD-Merged.csv (merged table for the results presented in VitD-CTD-Genes.csv and VitD-Ramagopalan2010.csv).

Extended data
This project contains the following extended data:
• overlapAnalysis.py (Python script used for all the analyses).
This project contains the following extended data in the 'Supp' folder:
• CAKUT-CausalGenesPMID29079659-PMID30143558.txt (list of genes known to be mutated in the monogenic form of CAKUT).
• targetsFromDifferentSourcesOverlap.odt (contains the information on vitamin A and D targets from different sources (CTD, publications), presents source specific genes and common genes).
This project contains the following extended data in the 'Supp/CAKUTCausalGenesOverlapWithOthers' folder:
• CAKUTCausalGenesOverlap.csv (overlap of CAKUT causal genes with other CAKUT-related gene sets without considering whether genes are vitamin targets or not).
This project contains the following extended data in the 'Supp/RandomizedSampling' folder:
• VitA-CTD-Genes-RS.csv (overlap analysis results for vitamin A targets from CTD using the randomized sampling approach).
• VitA-Balmer2002-Genes-RS.csv (overlap analysis results for vitamin A targets from Balmer and Blomhoff using the randomized sampling approach; please see answer to the reviewers).
• VitD-CTD-Genes-RS.csv (overlap analysis results for vitamin D targets from CTD using the randomized sampling approach).

Is the study design appropriate and is the work technically sound?
The addition of targeted vitamin A/retinoic acid and D gene sets is appreciated. Limiting CTD exported gene sets to those with at least 2 references is a reasonable approach to reducing potential false positives. I agree with the authors that this study is currently hypothesis generating. Given that patient CAKUT gene sets already exist and are publicly available (Jovanovic et al., 2018, GEO dataset GSE83946) 1 , the authors should also assess the enrichment of their gene sets against differentially expressed genes from patients with CAKUT and healthy control reference tissue.

If applicable, is the statistical analysis and its interpretation appropriate?
Because the gene sets are substantially different in size, a direct comparison of enrichment and p-values can be misleading. I would suggest the authors generate a randomized gene set of equivalent size to each of the vitamin A and vitamin D gene sets and perform comparative enrichment analyses on the relevant pathways.
While not necessary, the addition of an odds ratio from 2x2 contingency matrices to assess the strength of enrichment would enhance the findings.
The targeted enrichment analysis of kidney and CAKUT related pathways is a reasonable approach to demonstrating a positive finding, i.e. the gene sets identified are associated with some of the expected/hypothesized pathways. It would be interesting to see if a more unbiased enrichment approach might also highlight other vitamin A and D relevant pathways, and perhaps additional renal-associated pathways not readily apparent from the targeted approach. This could also substantiate the authors' vitamin A and D curation efforts.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
presented the results. We observed that vitamin A targets obtained from CTD significantly overlapped with the differentially expressed genes.

Because the gene sets are substantially different in size, a direct comparison of enrichment and p-values can be misleading. I would suggest the authors generate a randomized gene set of equivalent size to each of the vitamin A and vitamin D gene sets and perform comparative enrichment analyses on the relevant pathways.
We used hypergeometric tests in order to take into account the sizes of gene sets. Indeed, these tests are used in enrichment analyses to assess sets of genes' overrepresentation in annotation terms of different sizes. However, we do agree that the experiment proposed by the reviewer is interesting to confirm our observations. We hence additionally performed a randomized sampling analysis.
We selected a set of random genes having the same size as the target genes set from the background genes list. We repeated this selection 2000 times. We counted the number of times that the random set had better overlap with the CAKUT-related gene set than the target genes set.
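The procedure described above can be sketched as below. This is a hedged illustration and not the code in overlapAnalysis.py: the function name `empirical_pvalue` and its parameters are hypothetical, and we assume that "better overlap" means a strictly larger intersection and that the reported value is the fraction of random sets that beat the observed overlap.

```python
import random


def empirical_pvalue(targets, gene_set, background, n_iter=2000, seed=None):
    """Fraction of random gene sets whose overlap with gene_set exceeds
    the overlap observed for the target genes."""
    rng = random.Random(seed)
    background = sorted(set(background))  # fixed order for reproducible sampling
    gene_set = set(gene_set)
    targets = set(targets)
    observed = len(targets & gene_set)
    better = 0
    for _ in range(n_iter):
        # Random set of the same size as the target set, drawn without
        # replacement from the background genes.
        sample = rng.sample(background, len(targets))
        if len(gene_set.intersection(sample)) > observed:
            better += 1
    return better / n_iter


# Toy data (hypothetical gene names): 100 background genes, a 10-gene set.
background = [f"g{i}" for i in range(100)]
gene_set = background[:10]
print(empirical_pvalue(background[:10], gene_set, background, seed=0))  # 0.0
```

Without a fixed seed the estimate varies between runs, which is consistent with the remark that the results of the randomized sampling approach may change in different runs.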
The results are consistent with the hypergeometric tests, except for two comparisons in which the overlaps were not significant according to the hypergeometric test but are significant according to the randomized sampling approach. Indeed, the adjusted p-value for the overlap between vitamin A targets retrieved from CTD and Reactome Signaling by WNT term is 0.146 (>0.05, insignificant) in the hypergeometric test but it is 0.0426 (<0.05, significant) in randomized sampling. Additionally, the adjusted p-value for the overlap between vitamin D targets retrieved from CTD and GO kidney morphogenesis term is 0.0507 (>0.05, insignificant) in the hypergeometric test but it is 0.042 (<0.05, significant) in randomized sampling. We should note that the results of the randomized sampling approach may change in different runs.
As this is an extra test for confirmation, we provided related documents in Underlying data and Extended data for the interested readers. The analysis process is described in "Supp/SupplementaryAnalyses.odt". The code of the randomized sampling test is available as part of overlapAnalysis.py; the results are in the "Supp/RandomizedSampling" folder.
While not necessary, the addition of an odds ratio from 2x2 contingency matrices to assess the strength of enrichment would enhance the findings.
All the numbers necessary to compute the enrichments (gene set sizes, domain sizes, intersection sizes, p-values) are provided in the tables. We think that providing additional tables and metrics might be confusing.
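For readers who wish to derive the odds ratio themselves from those numbers, a minimal sketch is given below. It is not part of the authors' analysis; the function name `odds_ratio` and the toy values are hypothetical, and k, K, n and N follow the notation of the hypergeometric test in the Overlap analyses section.

```python
def odds_ratio(k: int, K: int, n: int, N: int) -> float:
    """Odds ratio of the 2x2 table built from overlap k, gene set size K,
    number of vitamin targets n, and background size N."""
    a = k              # targets inside the gene set
    b = n - k          # targets outside the gene set
    c = K - k          # non-targets inside the gene set
    d = N - K - n + k  # non-targets outside the gene set
    if b == 0 or c == 0:
        return float("inf")  # undefined ratio, reported as infinite here
    return (a * d) / (b * c)


# Toy example (hypothetical numbers): N=10, K=4, n=3, k=2.
print(odds_ratio(k=2, K=4, n=3, N=10))  # 5.0
```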
The targeted enrichment analysis of kidney and CAKUT related pathways is a reasonable approach to demonstrating a positive finding, i.e. the gene sets identified are associated with some of the expected/hypothesized pathways. It would be interesting to see if a more unbiased enrichment approach might also highlight other vitamin A and D relevant pathways, and perhaps additional renal-associated pathways not readily apparent from the targeted approach. This could also substantiate the authors' vitamin A and D curation efforts.
We agree that our initial analyses could be extended. As a matter of fact, we are currently extending our approach from CAKUT-associated pathways to the available rare disease pathways. However providing all the enrichment results for all the pathways without any interpretation will not be useful for the audience. As we provide the vitamin target gene lists and the code to reproduce or extend our work, the interested (and expert) researchers with a precise question in mind could perform the analyses. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Olivia Angelin-Bonnet
School of Fundamental Sciences, College of Sciences, Massey University, Palmerston North, New Zealand

The article "Overlap of vitamin A and vitamin D target genes with CAKUT-related processes" investigates the overlap between vitamin A and D target genes and genes associated with congenital anomalies of the kidney and urinary tract (CAKUT). More specifically, the authors assessed the significance of the intersection between sets of genes identified by previous publications or through databases as targets of either vitamin A or D, and sets of genes identified as causal to CAKUT or associated with pathways of interest with respect to CAKUT. This review focuses on the statistical analysis presented in this paper.
I find the use of a hypergeometric test for this analysis completely appropriate. However, I found that while the Methods section provides a good explanation of the construction of the different gene sets investigated, there is no presentation or discussion of the statistical test used to assess the significance of the overlap between different gene sets. More specifically, I would like to see a brief explanation of the hypergeometric test, in order to justify its use to the user, as well as a statement of its null hypothesis. In addition, there is no mention of the background sets used for the test. I think they should be explicitly mentioned, especially since they differ with the pathway of interest tested (e.g. all GO terms for GO-related gene sets, all Reactome genes for Reactome sets, etc) and are used to constrain the sets of vitamin-related genes (i.e. how many vitamin genes were discarded because they did not appear in the background set?). I also want to note that the availability of the data and the (clearly commented) python script was really appreciated. It ensures that the statistical analysis can be fully reproduced.
Lastly, a minor presentation point: I am not sure that having the p-value between brackets in the same cell as the list of interesting genes in the two tables is very clear, especially when the intersection set is empty and the resulting p-value is 1. I would suggest to add "p-value = " before the actual value to make it clearer.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes

I find the use of a hypergeometric test for this analysis completely appropriate. However, I found that while the Methods section provides a good explanation of the construction of the different gene sets investigated, there is no presentation or discussion of the statistical test used to assess the significance of the overlap between different gene sets. More specifically, I would like to see a brief explanation of the hypergeometric test, in order to justify its use to the user, as well as a statement of its null hypothesis. In addition, there is no mention of the background sets used for the test. I think they should be explicitly mentioned, especially since they differ with the pathway of interest tested (e.g. all GO terms for GO-related gene sets, all Reactome genes for Reactome sets, etc) and are used to constrain the sets of vitamin-related genes (i.e. how many vitamin genes were discarded because they did not appear in the background set?). I also want to note that the availability of the data and the (clearly commented) python script was really appreciated. It ensures that the statistical analysis can be fully reproduced.
We thank the reviewer for pointing out the lack of necessary information on the applied test. We added the explanations for the statistical test, the null hypothesis, and the background sets used in the tests to the "Methods" section under "Overlap analyses" heading.
"We computed the overlaps between vitamin target gene sets and the CAKUT gene sets of interest defined previously. After obtaining the overlap results, we calculated the significance of the overlaps using the hypergeometric test. Hypergeometric test calculates the statistical significance of getting k successes in n draws without replacement from a finite population of size N that contains a total of K successes. In our context, k is the number of genes overlapping with a specific gene set, K is the size of that gene set, n is the number of vitamin targets, and N is the background gene set (also known as the reference set). We set N to the number of annotated genes in the database from which we obtained the tested gene set. For the CAKUT causal genes set and the differentially expressed genes set, we used the largest N among the different databases, which corresponds to Gene Ontology. The null hypothesis is that the vitamin target genes are not associated with the gene set and the selection of k genes from that gene set is just a result of simple random sampling. We performed a hypergeometric test for all the gene sets. A consequence of multiple testing is false discoveries in which null hypothesis is rejected erroneously. To control for the false discovery rate, we used the Benjamini-Hochberg correction method (BH adjusted)."

I would suggest to add "p-value = " before the actual value to make it clearer.
As suggested, we added "p-value = " before the actual value.

© 2021 Simons M.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Matias Simons
Laboratory of Epithelial Biology and Disease, Imagine Institute, Université Paris Descartes-Sorbonne, Paris, France

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 15 Apr 2022
Ozan Ozisik, Aix Marseille University, France

We thank the reviewer for the valuable comments. We present the reviewer comments, our answers to them, and quoted texts from the revised manuscript below.
Given a list of 1086 Vit A target genes but only 263 Vit D target genes, it is not surprising that less overlap is found for Vit D. This should be commented on.
Regarding the overlaps between gene lists: as stated, with fewer target genes, fewer genes are expected to overlap with the CAKUT-related gene sets. However, in the manuscript, we do not focus on the exact number of overlapping genes. Indeed, in order to take into account the variable sizes of the different datasets, we used the hypergeometric test. Our focus is hence on the significance of the overlaps considering the dataset sizes, i.e. the observed overlaps as compared to the expected overlaps given the dataset sizes. This means that even a small number of overlapping genes could lead to a significant overlap between vitamin D targets and CAKUT-related genes. However, we did not observe many significant overlaps. The overlaps between vitamin A targets and CAKUT-related gene sets were greater than expected, which is the reason they are statistically significant. In the revised manuscript we added the explanations for the statistical test.
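The size adjustment can be made concrete with the expected overlap under the null hypothesis. For n vitamin targets, a gene set of size K, and a background of N genes, random sampling gives an expected overlap of nK/N. In the worked example below, the target counts 1086 and 263 are those quoted in the comment above, whereas K = 200 and N = 20000 are hypothetical values chosen only for illustration.

```latex
\mathbb{E}[k] = \frac{nK}{N}, \qquad
\mathbb{E}[k_{\text{vit A}}] = \frac{1086 \times 200}{20000} \approx 10.9, \qquad
\mathbb{E}[k_{\text{vit D}}] = \frac{263 \times 200}{20000} \approx 2.6
```

Under these assumed sizes, an overlap of 8 genes would be below expectation for vitamin A but about three times the expectation for vitamin D, which is why a shorter target list does not by itself preclude a significant overlap.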
It is said in abstract and introduction that only 5% to 20% of CAKUT cases have a monogenic origin. However, it has to be noted that these numbers are only based on whole exome sequencing studies. It is to be expected that whole genome sequencing will discover new monogenic causes.
We used a list of causal genes curated from multiple studies. We are not sure if they are all based on whole exome sequencing. However, we agree that whole genome sequencing will help discover new monogenic causes. We hence added a statement related to whole genome sequencing to the conclusion: "The increased availability of multi-omics technologies, such as whole genome sequencing and metabolomics will certainly open new directions, e.g. by revealing new causal genes or identifying metabolic disturbances due to environmental factors."
The discussion could include more perspectives on how new -omic techniques could be used in the future.
We agree that many developments are expected in this area, including whole genome sequencing but also single-cell or spatial transcriptomics, on human, animal models or organoids. In order to mention all these potential technologies while keeping the text focused, we added the following sentence to the conclusion:

The manuscript "Overlap of vitamin A and vitamin D target genes with CAKUT-related processes" aims to indirectly evaluate the possible relationship between Vitamin A and D intake imbalance and congenital anomalies of the kidney and urinary tract (CAKUT). The work considers genes reported as dysregulated in CAKUT and key genes involved in different urinary morphogenetic pathways. Both were overlapped with target genes of Vitamin A and Vitamin D (extracted from CTD and from literature: Balmer and Blomhoff, 2002 for Vitamin A, Ramagopalan et al., 2010 for Vitamin D). As expected considering the biological activity of the Vitamin A morphogenetic derivative (retinoic acid), significant overlaps are reported in the manuscript between Vitamin A target genes and genes involved in several urinary morphogenetic processes and signalling. Interestingly, significant overlaps were observed between Vitamin A target genes and CAKUT-related genes. As far as Vitamin D is concerned, overlaps were limited to some kidney morphogenetic genes, and no overlap was found with CAKUT-related genes.
In general, this work can be considered of interest in order to postulate a possible relationship between Vitamin imbalance (mainly Vitamin A) and CAKUT and to suggest that a nutritional evaluation in pregnancy is essential to ensure a physiological development and maternal/fetal health.
Some suggestions in order to improve the impact of the manuscript:

Abstract:
The abstract should be rewritten in order to better clarify aim and results of the work. In particular, Vitamin D results are lacking.

Introduction:
The aim should be better explained.

Methods:
Indicate retinoic acid as "the active metabolite of vitamin A" instead of "vitamin A acid". Better explain the statistical analysis.

Conclusions:
Can be rewritten in order to better focus the relevance of the nutritional management during pregnancy in order to ensure both maternal and fetal health. A general consideration should be introduced, considering also the pleiotropic functions of the involved genes.

References:
Please check references, i.e. reference 54 seems not pertinent.

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

We thank the reviewer for the detailed summary and valuable comments. We present the reviewer comments, our answers to them, and quoted texts from the revised manuscript below.

Abstract
The abstract should be rewritten in order to better clarify aim and results of the work. In particular, Vitamin D results are lacking.
We revised the abstract; we clarified our aim and added the results for vitamin D.

Introduction
The aim should be better explained.
We clarified the aim of the study in the introduction.
"In this study, we examined whether vitamin A and vitamin D target genes are related to gene sets involved in CAKUT and renal system development. Our aim is to generate hypotheses for the involvement of these vitamins in the disease aetiology."

We used additional datasets from two publications to have independent sources for vitamin target information. Only a small subset of the articles (27 of 1191) reviewed by Balmer and Blomhoff (2002) are included in the CTD curation, and Ramagopalan et al. (2010) is not included in the CTD curation. We clarified this in the methods section.
"As an independent source, we used the data from the study of Balmer and Blomhoff. 30 In this study, the authors reviewed published data from 1191 articles to identify genes regulated by retinoic acid (the active metabolite of vitamin A) in humans and other species. Only a small subset of these articles are part of the CTD curation."
"Ramagopalan et al. 31 identified 230 genes with significant expression changes in response to vitamin D on lymphoblastoid cell lines using microarrays. This publication is not part of the CTD curation, hence it brings additional and independent information."

Methods
Indicate retinoic acid as "the active metabolite of vitamin A" instead of "vitamin A acid". Better explain the statistical analysis.
We changed "vitamin A acid" to "the active metabolite of vitamin A".

"In this study, the authors reviewed published data from 1191 articles to identify genes regulated by retinoic acid (the active metabolite of vitamin A) in humans and other species."
We added the description of the statistical analysis.
"We computed the overlaps between vitamin target gene sets and the CAKUT gene sets of interest defined previously. After obtaining the overlap results, we calculated the significance of the overlaps using the hypergeometric test. Hypergeometric test calculates the statistical significance of getting k successes in n draws without replacement from a finite population of size N that contains a total of K successes. In our context, k is the number of genes overlapping with a specific gene set, K is the size of that gene set, n is the number of vitamin targets, and N is the background gene set (also known as the reference set). We set N to the number of annotated genes in the database from which we obtained the tested gene set. For the CAKUT causal genes set and the differentially expressed genes set, we used the largest N among the different databases, which corresponds to Gene Ontology. The null hypothesis is that the vitamin target genes are not associated with the gene set and the selection of k genes from that gene set is just a result of simple random sampling. We performed a hypergeometric test for all the gene sets. A consequence of multiple testing is false discoveries in which null hypothesis is rejected erroneously. To control for the false discovery rate, we used the Benjamini-Hochberg correction method (BH adjusted)."
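The procedure described in this quoted passage can be sketched in a few lines of Python. This is an illustrative sketch using only the standard library, not the authors' overlapAnalysis.py. The function names are hypothetical, and the choice of a one-sided upper-tail probability P(X >= k) is an assumption about how the significance of "k successes" is computed. The symbols k, K, n and N follow the definitions in the quoted text.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    k: observed overlap, K: gene set size,
    n: number of vitamin targets, N: background size.
    Assumption: enrichment is tested one-sided.
    """
    upper = min(K, n)  # the overlap cannot exceed either set size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers, for illustration only.
p_small = hypergeom_pvalue(k=2, K=4, n=3, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In this toy case, p_small equals 40/120, i.e. one third, and an empty overlap (k = 0) always yields a p-value of 1, which matches the "p-value = 1" entries discussed for the tables.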

Results
Tables are not fully self-explanatory. I suggest:
○ a separate column for p-value;
Adding a new column to the table for p-values causes difficulty in layout. We hence added "p-value =" before each value in the parentheses to clarify the table.
○ write in bold the genes both overlapping with CAKUT and urinary pathways/signalling.
The overlap of CAKUT causal genes with other gene sets can be interesting for the reader. However, as the aim of the tables in the manuscript is to present overlap of vitamin target genes and CAKUT related gene sets, presenting this extra information might be confusing for the reader. Additionally, the information of overlap between CAKUT causal genes and other pathways will be incomplete and could be misleading as there might be other genes which are common to these but are not vitamin targets. For these reasons, we now provide the overlap of CAKUT causal genes with all the other pathways (without considering whether genes are vitamin targets or not) in a supplementary dataset. This analysis is described in "Supp/SupplementaryAnalyses.odt", the code for the calculation of this overlap is available as part of overlapAnalysis.py, and the overlap table is presented in "Supp/CAKUTCausalGenesOverlapWithOthers/CAKUTCausalGenesOverlap.csv" in Underlying data and Extended data.
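To illustrate what such a supplementary overlap table involves, here is a minimal Python sketch. It does not reproduce overlapAnalysis.py or the contents of CAKUTCausalGenesOverlap.csv: the function names, column names, gene identifiers and pathway names are all hypothetical placeholders.

```python
import csv
import io

COLUMNS = ["gene_set", "set_size", "overlap_size", "overlap_genes"]


def overlap_table(reference_genes, gene_sets):
    """One row per gene set, listing the genes it shares with `reference_genes`."""
    reference = set(reference_genes)
    rows = []
    for name, genes in sorted(gene_sets.items()):
        members = set(genes)
        shared = sorted(reference & members)
        rows.append({
            "gene_set": name,
            "set_size": len(members),
            "overlap_size": len(shared),
            "overlap_genes": ";".join(shared),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text (written to a string here, not to disk)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers, not real gene or pathway names.
causal_genes = ["GENE_A", "GENE_B", "GENE_C"]
pathways = {
    "pathway_1": ["GENE_A", "GENE_C", "GENE_X"],
    "pathway_2": ["GENE_Y"],
}
rows = overlap_table(causal_genes, pathways)
csv_text = rows_to_csv(rows)
```

Note that the overlap here is computed over all genes in each pathway, regardless of whether they are vitamin targets, as the response describes.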

Moreover, (1) in table 2 is not explained.
The "(1)" entries in table 2 were p-values of 1. We hope that with the addition of "p-value =", these p-values are clearer now.

Looking in detail at Vitamin A table, four CAKUT-causal genes (BMP4, HNF1B, NRIP1 and RET, overlapping both with CTD and Balmer and Blomhoff) appear as a leitmotiv in different vitamin-A related morphogenetic pathways. In my personal opinion, this result should be stressed and a brief description of functions of BMP4, HNF1B, NRIP1 and RET in kidney and urinary tract development introduced in Results and/or Conclusions.
We agree that these are indeed interesting genes to highlight. We stressed that BMP4, HNF1B, NRIP1 and RET, which are CAKUT causal genes, are vitamin A targets according to both CTD and Balmer and Blomhoff. We added a description of their roles.
"Four genes (BMP4, HNF1B, RET, NRIP1) appear as particularly interesting as they are CAKUT causal genes and vitamin A targets according to both CTD and Balmer and Blomhoff. BMP4 is a growth factor of the TGF-beta family involved in a wide variety of developmental processes. 48 BMP4 inhibits ectopic budding of the ureteric tips and promotes elongation of the ureter. 49 RET is a tyrosine-protein kinase receptor, involved in a large variety of cellular processes. RET signaling is important, in particular, for the terminal growth of the nephric duct, and for initial formation and outgrowth of the ureteric bud. 50 The Hepatocyte Nuclear Factor 1-beta (HNF1B) is a transcription factor involved in ureteric bud branching and induction of nephrogenesis. It is required for normal S-shaped bodies patterning and subsequent morphogenesis of all nephron segments. 51 Finally, Nuclear receptor interacting protein 1 (NRIP1) modulates the activity of nuclear receptors. It has been shown to play roles in renal malformations via deregulations of retinoic acid signaling. 52 "

Conclusions:
Can be rewritten in order to better focus the relevance of the nutritional management during pregnancy in order to ensure both maternal and fetal health. A general consideration should be introduced, considering also the pleiotropic functions of the involved genes.